Classification with model and localization uncertainty

ABSTRACT

Methods and systems are provided for fusing responses of a classifier that provides a model uncertainty measure, while accounting for viewpoint-dependent variations in object appearance and correlations in classifier responses, and accounting for localization uncertainty.

FIELD OF THE INVENTION

The present invention relates to robotic perception and object classification.

BACKGROUND

Object detection and classification is a component of situational awareness important to many autonomous systems and robotic tasks. The mobility of robotic systems is widely exploited to overcome limitations of static, one-point-of-view approaches to measurement classification. Limitations may include problems such as occlusions, class aliasing (due to classifier imperfections or objects that appear similar from certain viewpoints), imaging problems, and false detections. By accumulating classification evidence across multiple observations and viewpoints, uncertainty can be reduced.

Variations in object appearance are often directly addressed using offline-built class models for inference rather than raw classifier measurements. Especially in active methods, such models are often themselves spatial and view-dependent. View-dependent models can allow for better fusion of classifier measurements by modelling correlations among similar viewpoints instead of requiring the common but usually false assumption of independence of measurements. See, for example: W. T. Teacy, et al., “Observation modelling for vision-based target search by unmanned aerial vehicles,” Intl. Conf. on Autonomous Agents and Multiagent Systems (AAMAS), pp. 1607-1614, 2015, hereinbelow, “Teacy”; Javier Velez, et al., “Modelling observation correlations for active exploration and robust object detection,” J. of Artificial Intelligence Research, 2012, hereinbelow, “Velez”.

Reliance on spatial models, however, introduces new problems, as robot localization is usually not precisely resolved, leading to errors when matching measurements against the model. This is aggravated in the presence of classifier measurements that do not comply with the model, as may happen when a classifier is deployed in an environment different in appearance from the one on which it was trained; for example, in a different country where objects semantically identical to the ones in the training set look different. In the latter case, classifier output would often be arbitrary rather than reflecting the actual uncertainty, known as model uncertainty, as described by Yarin Gal and Zoubin Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” Intl. Conf. on Machine Learning (ICML), 2016 (hereinbelow, “Gal-ICML”). In the domain of Bayesian deep learning, methods exist to approximate the above uncertainty as a network posterior, as described by Pavel Myshkov and Simon Julier, “Posterior distribution analysis for Bayesian inference in neural networks,” Advances in Neural Information Processing Systems (NIPS), 2016. One method is based on test-time dropout, as described by Yarin Gal and Zoubin Ghahramani, “Bayesian convolutional neural networks with Bernoulli approximate variational inference,” arXiv preprint arXiv:1506.02158, 2016 (hereinbelow, “Gal-ArXiv”). Dropout permits the calculation of network posteriors for virtually any deep learning-based classifier, without requiring a change in the model.

Visual classification fusion methods can be split into methods directly using classifier scores, and methods matching classifier measurements to a statistical model or fusing them using a specially trained classifier. The rationale for using a class model rather than individual classifier measurements lies in the variation in object appearance and background with viewpoint, which cannot always be correctly captured by the classifier, as well as in situations where training data is not representative of the test data, e.g., where a classifier was not or cannot be retrained specifically for the domain where it is deployed, and therefore its responses cannot be directly relied upon. Viewpoint-dependent object appearance (and hence, classifier response) may be analyzed as classifier noise (e.g., Omidshafiei, et al., “Hierarchical Bayesian noise inference for robust real-time probabilistic object classification,” preprint arXiv:1605.01042, 2016, hereinbelow “Omidshafiei”). Alternatively, the viewpoint-dependent object appearance may be modeled directly as spatial variation.

A common assumption in many classification methods is that of statistical independence of individual class measurements. This assumption is generally false; e.g., observations from the same or similar poses are likely to be extremely similar, resulting in similar classifier responses. Considering these observations as independent leads to an overly confident posterior. Velez and Teacy deal with this by learning Gaussian Process (GP) regressors to describe both per-class spatial variation in classifier responses and spatial statistical dependence.

Another often-violated common assumption is that of known robot localization. This is specifically the weakness of methods modelling spatial variation of classifier responses, as localization error may introduce class aliasing when matching classifier responses against a spatial model.

Monte-Carlo dropout can be interpreted as an approximation to the model uncertainty of a Bayesian neural network classifier, expressed as the posterior distribution of the output for Gaussian priors on weights (Gal-ArXiv). Model uncertainty quantifies the reliability of classifier responses for given input data, complementing the classifier response vector (softmax output). While there are other ways of approximating the network posterior and other reliability measures, MC dropout is practical because it requires no change in architecture and no computations that are not otherwise part of the model.

Existing classification fusion methods do not address model uncertainty. Indeed, with few exceptions, most current methods discard the classification vector commonly output by the classifier, using only the most likely class (i.e., the component with the highest response) for belief updates. Likewise, most methods ignore uncertainty in localization, assuming it to be perfectly known.

SUMMARY OF THE INVENTION

Embodiments of the present invention provide methods and systems for (a) fusing responses of a classifier that provides a model uncertainty measure, while (b) accounting for viewpoint-dependent variations in object appearance and correlations in classifier responses, and (c) accounting for localization uncertainty. Simulation confirms that the process is robust with respect to these sources of uncertainty compared with other methods. By fusing responses of a classifier that provides a model uncertainty measure, while accounting for viewpoint-dependent variations in object appearance and correlations in classifier responses, and accounting for localization uncertainty, the methods and systems provided by the present invention identify which of multiple known Gaussian processes (GPs) is the most likely origin of measurements of an object viewed from different viewpoints.

There is therefore provided by embodiments of the present invention a method of classifying an object and determining an uncertainty of the object classification, wherein the object appears in multiple sequential measurements of a scene, the method including: fusing responses of a classifier that provides a model uncertainty measure, while accounting for viewpoint-dependent variations in object appearance and correlations in classifier responses, and accounting for localization uncertainty, to provide comparative object classification distributions. The multiple sequential measurements may be measurements such as images, laser scans, or other measurements known in the field of robotic perception. The measurements of the scene are typically acquired by a robot moving along a given trajectory, and no initial localization information is provided to the classifier regarding the robot's coordinates or orientation, or regarding the object's coordinates or orientation.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the invention, reference is made to the following description and accompanying drawings, in which:

FIG. 1 is a schematic diagram of a scenario indicating labeling of an object class from measurements of an object viewed from multiple poses, according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a scenario indicating a robot acquiring observations while moving along a path in the vicinity of an object of interest, according to an embodiment of the present invention;

FIGS. 3A-C are graphs of model certainty while measurements are made during the course of a robot movement along a path in the vicinity of an object of interest, according to an embodiment of the present invention;

FIGS. 4A-F are graphs of simulation results comparing the methods of the present invention with other methods, indicating result metrics of probability of correct class, most likely-to-ground truth ratio (MGR), and mean squared detection error (MSDE), according to an embodiment of the present invention;

FIGS. 5A-C are graphs of ground truth class and simulated (noised) classifier measurements from multiple sequential poses, with localization bias, according to an embodiment of the present invention; and

FIGS. 6A-F are graphs of simulation results with localization bias, comparing results of the present invention with other methods, indicating result metrics of probability of correct class, MGR and MSDE, according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Embodiments of the present invention provide methods for classification of objects viewed from multiple poses, where the classification includes measures of classifier model uncertainty, robot localization uncertainty, and spatial correlation.

Problem Formulation

Consider a robot traversing an unknown environment, taking observations of different scenes. Robot motion between times $t_k$ and $t_{k+1}$ is initiated by a control input $u_k$, which may originate from a human user or be determined by a motion planning algorithm. Let the robot pose at time instant $k$ be denoted as $x_k$, and define $\mathcal{X}_{0:k} = \{x_0, \ldots, x_k\}$ as the sequence of poses up to that time.

Let $\mathcal{H}_k = \{\mathcal{U}_{0:k-1}, \mathcal{Z}_{0:k}\}$ represent the history, comprising observations $\mathcal{Z}_{0:k} = \{z_0, \ldots, z_k\}$ and controls $\mathcal{U}_{0:k-1} = \{u_0, \ldots, u_{k-1}\}$ up until time $k$. The goal is to classify a single object belonging to one of $N_c$ known classes, denoted by indexes $\mathcal{C} = \{1, \ldots, N_c\}$.

The classification posterior, or belief, at time instant $k$ is:

$b[c_k] \doteq \mathbb{P}(c \mid \mathcal{H}_k). \qquad (1)$

The classification posterior is the probability that the object in question belongs to a class $c \in \mathcal{C}$, given all measurements and user controls up to time $k$. In calculating this posterior, we want to take into account spatial correlation among measurements, model uncertainty, and uncertainty in the positions from which these measurements are taken (i.e., localization uncertainty).

Classifier Model

Commonly, the output of a classifier can be interpreted as a categorical distribution over classes (e.g., by applying softmax to its outputs). However, high responses may be unstable, specifically when inputs are far from the training data. The technique proposed in Gal-ICML may be used to obtain an approximation of the model uncertainty of the neural network classifier. With this technique, several forward passes are performed for every classifier input, applying random dropout at each pass, to obtain a set of classification vectors. Subsequently, the uncertainty is characterized, yielding classifier output as shown in FIG. 2. Typically, the robot has at its disposal an object classifier unit which, given observation $z_k$ (e.g., an image or a measurement such as a laser scan), calculates a set of outputs $\mathcal{S}_k \doteq \{s_k\}$, where each output $s_k \in \mathbb{R}^{N_c \times 1}$ represents a categorical belief over the class of the observed object, i.e., $\sum_{i=1}^{N_c} s_k^{(i)} = 1$. The set $\mathcal{S}_k$ can be interpreted as an approximation to the distribution:

$\mathbb{P}(s \mid z_k), \qquad (2)$

carrying information on the classifier's model uncertainty for the given input $z_k$.
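The dropout-based sampling described above can be sketched as follows. This is a minimal illustration, not the classifier of the invention: it assumes a hypothetical two-layer network with weights `W1` and `W2` (a real system would use a trained CNN), keeps dropout active on the hidden layer at test time, and returns one softmax vector per stochastic forward pass, i.e., an approximation of the set $\mathcal{S}_k$. The helper `gaussian_summary` (also hypothetical) estimates per-component means and variances of $\mathcal{S}_k$, the quantities later used as $\mu_{z_k}$ and $\Sigma_{z_k}$ in Eq. (29); only per-component variances are returned, because the full covariance of vectors that sum to one is singular.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_with_dropout(x, W1, W2, n_passes=50, p_drop=0.5, rng=None):
    """Return an (n_passes, N_c) array of class vectors, one per stochastic pass.

    Dropout stays active at test time: each pass draws a fresh Bernoulli keep-mask
    on the hidden units (inverted dropout, scaling kept units by 1/(1 - p_drop)).
    """
    rng = np.random.default_rng() if rng is None else rng
    h = np.maximum(0.0, W1 @ x)                        # ReLU hidden layer, shape (H,)
    masks = rng.random((n_passes, h.size)) >= p_drop   # one keep-mask per pass
    h_drop = masks * h / (1.0 - p_drop)                # (n_passes, H)
    return softmax(h_drop @ W2.T)                      # (n_passes, N_c), rows sum to 1

def gaussian_summary(S):
    """Per-component mean and variance of the sample set S (used for Eq. (29))."""
    return S.mean(axis=0), S.var(axis=0, ddof=1)
```

For example, with random stand-in weights `rng = np.random.default_rng(0)`, `W1 = rng.normal(size=(16, 8))`, `W2 = rng.normal(size=(3, 16))` and a feature vector `z = rng.normal(size=8)`, calling `classify_with_dropout(z, W1, W2, n_passes=100, rng=rng)` yields 100 three-class vectors whose spread reflects the model uncertainty for that input.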

Viewpoint-Dependent Class Model

For the class likelihood we use a model similar to the one proposed by Teacy. For a single classifier measurement $s$ (categorical vector) made from relative pose $x^{(rel)}$, the class likelihood is a probabilistic model:

$\mathbb{P}(s \mid c, x_k^{(rel)}), \qquad (3)$

where $c \in \mathcal{C}$ is the object class, and the subscript $k$ denotes a time index. Denoting the object pose in the global frame as $o$, we can explicitly write:

$x_k^{(rel)} \doteq x_k \ominus o. \qquad (4)$
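For planar poses, the relative pose of Eq. (4) can be computed as in the following sketch. The source does not define $\ominus$ beyond Eq. (4); this assumes the common SE(2) convention, with poses given as hypothetical `(px, py, theta)` tuples, and expresses the robot pose $x_k$ in the frame of the object pose $o$.

```python
import math

def ominus(x, o):
    """Relative pose x ⊖ o: express robot pose x in the frame of object pose o.

    Both poses are (px, py, theta) tuples in a common 2D world frame.
    """
    dx, dy = x[0] - o[0], x[1] - o[1]
    c, s = math.cos(o[2]), math.sin(o[2])
    # rotate the world-frame translation difference into the object frame
    rel_x = c * dx + s * dy
    rel_y = -s * dx + c * dy
    # wrap the relative heading to (-pi, pi]
    rel_th = math.atan2(math.sin(x[2] - o[2]), math.cos(x[2] - o[2]))
    return (rel_x, rel_y, rel_th)
```

For instance, a robot at world position (1, 0) facing along the x axis, viewing an object at the origin whose heading is π/2, lies at (0, -1) with heading -π/2 in the object frame.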

The dependence of the model in Eq. (3) on viewpoint naturally captures view-dependent variations in object appearance. Furthermore, to incorporate the notion that similar views tend to yield similar classifier responses and, in particular, are not independent, the joint distribution:

$\mathbb{P}(\mathcal{S}_{0:k} \mid c, \mathcal{X}_{0:k}^{(rel)}), \qquad (5)$

is assumed to characterize the classification unit's outputs $\mathcal{S}_{0:k} \doteq \{\mathcal{S}_0, \ldots, \mathcal{S}_k\}$ when an object of class $c$ is viewed from a sequence of relative poses $\mathcal{X}_{0:k}^{(rel)} \doteq \{x_0^{(rel)}, \ldots, x_k^{(rel)}\}$. As described by both Teacy and Velez, this joint distribution can be represented as a Gaussian Process, learned using a classifier unit. Explicitly, we model a training set classifier response when viewing an object of class $c$ from a relative pose $x^{(rel)}$ as:

$s^{(i)} = f_{i|c}(x^{(rel)}) + \varepsilon, \qquad (6)$

where the index $i$ denotes component $i$ of classification vector $s$, $\varepsilon \sim N(0, \sigma_n^2)$ is i.i.d. noise, and (dropping the $(rel)$ superscript for clarity):

$f_{i|c}(x) \sim \mathcal{GP}(\mu_{i|c}(x), k_{i|c}(x, x')), \qquad (7)$

where $\mu_{i|c}$ and $k_{i|c}$ are the mean and covariance functions defining the GP:

$\mu_{i|c}(x) = \mathbb{E}\{s^{(i)} \mid c, x\} \qquad (8)$

$k_{i|c}(x, x') = \mathbb{E}\{(f_{i|c}(x) - \mu_{i|c}(x))(f_{i|c}(x') - \mu_{i|c}(x'))\} \qquad (9)$

The classification vector for each class $c$ is modeled with independent, per-component GPs. Note also the Gaussian approximation of the distribution of the classification vector, which resides in the simplex (other representations exist, but they are not readily interpreted as a spatial model). For the covariance, the squared exponential function is applied:

$k_{i|c}(x, x') = \sigma_{i|c}^2 \exp\left(-\tfrac{1}{2}(x - x')^T L_{i|c}^{-1} (x - x')\right), \qquad (10)$

where $\sigma_{i|c}^2$ is the variance and $L_{i|c}$ is the length-scale matrix, determining the rate of covariance decay with distance. These parameters can be learned from training data; however, in our simulations they were set by hand.
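A minimal NumPy sketch of the squared exponential kernel of Eq. (10), evaluated on all pairs of row-stacked relative poses to form the kernel matrices $K_c$ used below. The function name `se_kernel` and its argument layout are illustrative assumptions.

```python
import numpy as np

def se_kernel(A, B, sigma2, L):
    """Squared exponential kernel of Eq. (10) between row-stacked inputs.

    A: (n, d) and B: (m, d) relative poses; sigma2: variance sigma_{i|c}^2;
    L: (d, d) length-scale matrix L_{i|c}. Returns the (n, m) matrix K with
    K[p, q] = sigma2 * exp(-0.5 * (a_p - b_q)^T L^{-1} (a_p - b_q)).
    """
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    L_inv = np.linalg.inv(L)
    diff = A[:, None, :] - B[None, :, :]                     # (n, m, d) pairwise differences
    quad = np.einsum("nmd,de,nme->nm", diff, L_inv, diff)    # Mahalanobis-type quadratic form
    return sigma2 * np.exp(-0.5 * quad)
```

Larger entries of $L_{i|c}$ make the covariance decay more slowly with distance, so responses from farther-apart viewpoints remain correlated.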

Denote the training set for class $c$ as $\{S_T^c, X_T^c\}$, with $S_T^c$ being classifier measurements and $X_T^c$ the corresponding poses, and denote (test-time) measurements as $S = \mathcal{S}_{0:k}$ and $X = \mathcal{X}_{0:k}^{(rel)}$. Furthermore, Eqs. (11)-(14) all hold per vector component, and the components are joined in Eq. (15). To simplify notation, we drop the index $i$ in $S^{(i)}$, $S_T^{(i)}$ and $k_{i|c}$.

As described in C. E. Rasmussen and C. K. I. Williams, “Gaussian Processes for Machine Learning,” MIT Press, Cambridge, Mass., 2006 (hereinbelow, “Rasmussen”), we model the joint distribution of classifier measurements (per component) for an object of class $c$ as:

$\begin{matrix}{{{{\mathbb{P}}\left( {S_{T}^{c},\left. S \middle| c \right.,X_{T}^{c},X} \right)} = {N\left( {0,\begin{bmatrix}{{K_{c}\left( {X_{T}^{c},X_{T}^{c}} \right)} + {\sigma_{n}^{2}I}} & {K_{c}\left( {X_{T}^{c},X} \right)} \\{K_{c}\left( {X,X_{T}^{c}} \right)} & {K_{c}\left( {X,X} \right)}\end{bmatrix}} \right)}},} & (11)\end{matrix}$

where $K_c$ is the matrix produced by applying kernel $k_c$ to all pairs of input vectors.

We thus obtain the conditional distribution for classifier measurements of an object of class $c$:

$\mathbb{P}(\mathcal{S}_{0:k} \mid c, X_T^c, S_T^c, \mathcal{X}_{0:k}^{(rel)}) = N(\mu, \Sigma), \qquad (12)$

with:

$\mu = K_c(X, X_T^c) \cdot H \cdot S_T^c \qquad (13)$

$\Sigma = K_c(X, X) - K_c(X, X_T^c) \cdot H \cdot K_c(X_T^c, X), \qquad (14)$

and where $H \doteq (K_c(X_T^c, X_T^c) + \sigma_n^2 I)^{-1}$.
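The conditional class model of Eqs. (12)-(14) and the per-component likelihood entering Eq. (15) can be sketched as follows for a single classification-vector component. This sketch assumes an isotropic kernel ($L = \ell^2 I$) and forms the predictive mean from the class-$c$ training responses $S_T^c$, the standard GP predictive mean (Rasmussen); it uses `np.linalg.solve` instead of forming $H$ explicitly, and the small `jitter` term is a numerical-stability assumption not found in the source. All function names are hypothetical.

```python
import numpy as np

def se_kernel(A, B, sigma2=1.0, ell=1.0):
    """Isotropic squared exponential kernel: Eq. (10) with L = ell**2 * I."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma2 * np.exp(-0.5 * d2 / ell**2)

def class_conditional(X_T, S_T, X, sigma_n2=1e-2, sigma2=1.0, ell=1.0):
    """Mean and covariance of one component at test poses X, given class-c
    training poses X_T and training responses S_T (Eqs. (12)-(14))."""
    Kn = se_kernel(X_T, X_T, sigma2, ell) + sigma_n2 * np.eye(len(X_T))  # H = Kn^{-1}
    K_xt = se_kernel(X, X_T, sigma2, ell)
    mu = K_xt @ np.linalg.solve(Kn, S_T)                                   # Eq. (13)
    Sigma = se_kernel(X, X, sigma2, ell) - K_xt @ np.linalg.solve(Kn, K_xt.T)  # Eq. (14)
    return mu, Sigma

def log_likelihood(S, mu, Sigma, jitter=1e-6):
    """log N(S; mu, Sigma) for one component; summing over components gives
    the log of the product in Eq. (15)."""
    n = len(S)
    C = Sigma + jitter * np.eye(n)          # small jitter for numerical stability
    _, logdet = np.linalg.slogdet(C)
    r = S - mu
    return -0.5 * (r @ np.linalg.solve(C, r) + logdet + n * np.log(2 * np.pi))
```

Repeating this for every component and summing the log-likelihoods gives the log of Eq. (15) for one class; comparing these values across classes indicates which GP model most likely generated the measurements.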

We finalize the equation by combining the per-component models into a joint class likelihood as:

$\begin{matrix}{{{\mathbb{P}}\left( {\left. S \middle| c \right.,X_{T}^{c},S_{T}^{c},X} \right)} = {\prod\limits_{i}{{\mathbb{P}}\left( {\left. S^{(i)} \middle| c \right.,X_{T}^{c},S_{T}^{c,{(i)}},X} \right)}}} & (15)\end{matrix}$

This approach differs from the approach described by Teacy, where inference from training data is performed by offline learning of GP mean and covariance functions, rather than by using a joint distribution as in Eq. (11).

To account for both localization and model uncertainty, we rewrite Eq. (1) as a marginalization over latent robot and object poses, and over classifier outputs. Marginalizing over the robot pose history and object pose gives the following equation:

$b[c_k] = \mathbb{P}(c \mid \mathcal{H}_k) = \int_{\mathcal{X}_{0:k}, o} \mathbb{P}(c, \mathcal{X}_{0:k}, o \mid \mathcal{H}_k)\, d\mathcal{X}_{0:k}\, do, \qquad (16)$

which, using the chain rule, can be written as

$b[c_k] = \int_{\mathcal{X}_{0:k}, o} \underbrace{\mathbb{P}(c \mid \mathcal{X}_{0:k}, o, \mathcal{H}_k)}_{(a)}\, \underbrace{\mathbb{P}(\mathcal{X}_{0:k}, o \mid \mathcal{H}_k)}_{(b)}\, d\mathcal{X}_{0:k}\, do. \qquad (17)$

Term (a) above is the classification belief given relative poses $\mathcal{X}_{0:k}^{(rel)}$, which are calculated from $\mathcal{X}_{0:k}$ and $o$ via Eq. (4). Term (b) represents the posterior over $\mathcal{X}_{0:k}$ and $o$ given observations and controls thus far. As such, this term can be obtained from existing SLAM approaches. One can further rewrite the above equation as

$b[c_k] = \mathbb{E}_{\mathcal{X}_{0:k}, o}\left\{\mathbb{P}(c \mid \mathcal{X}_{0:k}^{(rel)}, \mathcal{H}_k)\right\}, \qquad (18)$

where the expectation is taken with respect to the posterior $\mathbb{P}(\mathcal{X}_{0:k}, o \mid \mathcal{H}_k)$ from term (b).

Hereinbelow, we assume that the object orientation relative to the robot is known (leaving $o$ with 3 degrees of freedom), and so this posterior can be computed using SLAM methods (described below with respect to localization inference), which commonly model this posterior with a Gaussian distribution. We then use the obtained distribution to approximate the expectation in Eq. (18) using sampling.
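The sampling approximation of Eq. (18) can be sketched as below, assuming, as the text does, a Gaussian localization posterior from the SLAM back end. In this sketch the trajectory and the object location are flattened into one state vector, and `term_a` is a hypothetical callable standing in for the computation of term (a), detailed in the next section; only the Monte Carlo average is shown.

```python
import numpy as np

def expected_class_posterior(mean, cov, term_a, n_x=200, rng=None):
    """Monte Carlo approximation of Eq. (18).

    mean, cov: Gaussian approximation of P(X_{0:k}, o | H_k) over one flattened
               state vector (assumed to come from a SLAM back end).
    term_a:    hypothetical callable mapping one sampled state vector to the
               class posterior P(c | X_{0:k}^{(rel)}, H_k), an array over classes.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.multivariate_normal(mean, cov, size=n_x)   # (n_x, dim) pose samples
    beliefs = np.array([term_a(x) for x in samples])         # (n_x, N_c) term (a) per sample
    return beliefs.mean(axis=0)                              # sample average = expectation
```

As the number of samples `n_x` grows, the average converges to the expectation in Eq. (18); the cost grows linearly in `n_x`, since term (a) is evaluated once per sampled realization.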

In the following, we detail the computation of terms (a) and (b) of Eq. (17).

Classification Under Known Localization

Here we describe a method of updating the classification belief given a known pose history, which is term (a) in Eq. (17), when receiving new measurements at time step $k$, while accounting for correlations with previous measurements and model uncertainty.

To simplify notation, we denote the history of observations, controls and (known) relative poses as

$H_k \doteq \mathcal{H}_k \cup \mathcal{X}_{0:k}^{(rel)} \equiv \{\mathcal{U}_{0:k-1}, \mathcal{Z}_{0:k}, \mathcal{X}_{0:k}^{(rel)}\}. \qquad (19)$

We start by marginalizing term (a) over model uncertainty in the classifier output at time $k$:

$\mathbb{P}(c \mid H_k) = \int_{s_k} \mathbb{P}(c \mid s_k, H_k) \cdot \mathbb{P}(s_k \mid H_k)\, ds_k. \qquad (20)$

Assuming $s_k$ carries the class information from measurement $z_k$, and that $s_k \sim p(s_k \mid z_k)$, we can rewrite this as

$\mathbb{P}(c \mid H_k) = \int_{s_k} \mathbb{P}(c \mid s_k, H_k \setminus \{z_k\}) \cdot \mathbb{P}(s_k \mid z_k)\, ds_k. \qquad (21)$

In our case, the elements of $\mathcal{S}_k$ are $n_k$ samples from $p(s_k \mid z_k)$, so we can approximate the integral as

$\begin{matrix}{{{\mathbb{P}}\left( c \middle| H_{k} \right)} \approx {\frac{1}{n_{k}}{\sum\limits_{s_{k} \in \mathcal{S}_{k}}{{{\mathbb{P}}\left( {\left. c \middle| s_{k} \right.,{H_{k} \smallsetminus \left\{ z_{k} \right\}}} \right)}.}}}} & (22)\end{matrix}$

To calculate the summand, we apply Bayes' law and then marginalize over the class in the denominator:

$\mathbb{P}(c \mid H_k) \approx \sum_{s_k} \frac{\eta(s_k)}{n_k} \cdot \mathbb{P}(s_k \mid c, H_k \setminus \{z_k\}) \cdot \mathbb{P}(c \mid H_k \setminus \{z_k\}), \qquad (23)$

with

$\begin{matrix}{{\eta\left( s_{k} \right)}{\overset{.}{=}{1/{\sum\limits_{c \in \mathcal{C}}{{{\mathbb{P}}\left( {\left. s_{k} \middle| c \right.,{H_{k} \smallsetminus \left\{ z_{k} \right\}}} \right)}{{{\mathbb{P}}\left( c \middle| {H_{k} \smallsetminus \left\{ z_{k} \right\}} \right)}.}}}}}} & (24)\end{matrix}$

Note that the denominator in $\eta(s_k)$ is the sum of the numerator (summand) terms in Eq. (23) over the different classes and can be computed efficiently (but cannot be discarded altogether, due to its dependence on $s_k$). Further, note that

$\begin{matrix}{{{\mathbb{P}}\left( c \middle| {H_{k} \smallsetminus \left\{ z_{k} \right\}} \right)} = {{\mathbb{P}}\left( {\left. c \middle| \mathcal{X}_{0:k}^{({rel})} \right.,\mathcal{Z}_{0:{k - 1}}} \right)}} & {{~~~~~~~~~~~~~~~~~~~~~~~~}(25)} \\{= {{{\mathbb{P}}\left( {\left. c \middle| \mathcal{X}_{0:{k - 1}}^{({rel})} \right.,\mathcal{Z}_{0:{k - 1}}} \right)} = {{{\mathbb{P}}\left( c \middle| H_{k - 1} \right)}.}}} & {(26)}\end{matrix}$
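For a single localization realization, the update of Eqs. (22)-(24), with the prior supplied recursively by Eq. (26), amounts to normalizing each dropout sample separately and then averaging. A minimal sketch, assuming the class likelihoods $\mathbb{P}(s_k \mid c, H_k \setminus \{z_k\})$ have already been evaluated into an array (the function name is hypothetical):

```python
import numpy as np

def fuse_dropout_samples(lik, prior):
    """Class update of Eqs. (22)-(24) for one localization realization.

    lik:   (n_k, N_c) array; lik[j, c] = P(s_k^(j) | c, H_k minus {z_k}) for each
           dropout sample s_k^(j) in S_k (computed via Eq. (27)).
    prior: (N_c,) array, P(c | H_{k-1}) from the previous step (Eq. (26)).
    Returns P(c | H_k) as an (N_c,) array.
    """
    h = lik * prior                            # unnormalized summands of Eq. (23)
    eta = 1.0 / h.sum(axis=1, keepdims=True)   # eta(s_k) of Eq. (24), one per sample
    return (eta * h).mean(axis=0)              # average over the n_k samples, Eq. (22)
```

Each sample is normalized before averaging because $\eta(s_k)$ depends on $s_k$; averaging the unnormalized likelihoods first would generally give a different result.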

As $\mathbb{P}(c \mid H_{k-1})$ has been computed in the previous step, we are left to compute the class likelihood term $\mathbb{P}(s_k \mid c, H_k \setminus \{z_k\})$. This term involves past observations $\mathcal{Z}_{0:k-1}$ but not classifier outputs $\mathcal{S}_{0:k-1}$, which need to be introduced to account for spatial correlation with $s_k$ using Eq. (5). Marginalizing over $\mathcal{S}_{0:k-1}$ (recall that in our notation $\mathcal{S}_{0:k-1} \cup \{s_k\} = \mathcal{S}_{0:k}$) yields

$\begin{matrix}{{{{\mathbb{P}}\left( {\left. s_{k} \middle| c \right.,{H_{k} \smallsetminus \left\{ z_{k} \right\}}} \right)} = {{\int\limits_{\mathcal{S}_{0:{k - 1}}}{{{\mathbb{P}}\left( {\left. \mathcal{S}_{0:k} \middle| c \right.,{H_{k} \smallsetminus \left\{ z_{k} \right\}}} \right)}{d\mathcal{S}}_{0:{k - 1}}}} = {\int_{\mathcal{S}_{0:{k - 1}}}{{{{\mathbb{P}}\left( {\left. s_{k} \middle| c \right.,\mathcal{S}_{0:{k - 1}},{H_{k} \smallsetminus \left\{ z_{k} \right\}}} \right)} \cdot {{\mathbb{P}}\left( {\left. \mathcal{S}_{0:{k - 1}} \middle| c \right.,\ {H_{k} \smallsetminus \left\{ z_{k} \right\}}} \right)}}{d\mathcal{S}}_{0:{k - 1}}}}}},} & (27)\end{matrix}$

where we applied the chain rule to separate between past classifier outputs $\mathcal{S}_{0:k-1}$, for which observations $\mathcal{Z}_{0:k-1}$ are given, and the current output $s_k$. The first term in the product reduces to $\mathbb{P}(s_k \mid c, \mathcal{S}_{0:k-1}, \mathcal{X}_{0:k}^{(rel)})$, a conditioned form of the class model Eq. (12) (and thus Gaussian, which we treat explicitly from Eq. (30) on). This term represents the probability of obtaining classification $s_k$ when observing an object of class $c$ from relative pose $x_k^{(rel)}$, given previous classification results and relative poses. The second term in Eq. (27) can be approximated using Eq. (2) for the individual observations $z_i$, i.e.,

${{\mathbb{P}}\left( {\left. \mathcal{S}_{0:{k - 1}} \middle| c \right.,{H_{k} \smallsetminus \left\{ z_{k} \right\}}} \right)} = {{{\mathbb{P}}\left( \mathcal{S}_{0:{k - 1}} \middle| \mathcal{Z}_{0:{k - 1}} \right)} \approx {\prod\limits_{i = 0}^{k - 1}{{\mathbb{P}}\left( s_{i} \middle| z_{i} \right)}}}$

Note that class $c$ and poses $\mathcal{X}_{0:k-1}^{(rel)}$, both members of $H_k$, can be omitted, since conditioning on observations determines classifier outputs up to uncertainty due to classifier intrinsics (model uncertainty). The approximation is in the last step, since in general classifier outputs $s_0, \ldots, s_{k-1}$ are interdependent through classifier parameters. We can now rewrite $\mathbb{P}(s_k \mid c, H_k \setminus \{z_k\})$ from Eq. (27) as

$\begin{matrix}{\int\limits_{\mathcal{S}_{0:{k - 1}}}{{{\mathbb{P}}\left( {\left. s_{k} \middle| c \right.,\mathcal{S}_{0:{k - 1}},\mathcal{X}_{0:k}^{({rel})}} \right)} \cdot {\prod\limits_{i = 0}^{k - 1}{{{\mathbb{P}}\left( s_{i} \middle| z_{i} \right)}{{d\mathcal{S}}_{0:{k - 1}}.}}}}} & (28)\end{matrix}$

Assuming the classifier output of Eq. (2) is Gaussian, we denote

$\mathbb{P}(s_i \mid z_i) = N(\mu_{z_i}, \Sigma_{z_i}) \qquad (29)$

where $\mu_{z_i}$ and $\Sigma_{z_i}$ are estimated from $\mathcal{S}_i$. Since the class model is Gaussian (see Eq. (12)), the first term in the integrand in Eq. (28) is a Gaussian that we denote as

$\mathbb{P}(s_k \mid c, \mathcal{S}_{0:k-1}, \mathcal{X}_{0:k}^{(rel)}) = N(\mu_{k|0:k-1}, \Sigma_{k|0:k-1}) \qquad (30)$

where, utilizing standard Gaussian Process equations (see, for example, Rasmussen):

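$\mu_{k|0:k-1} = \mu_k + \Omega \cdot (\mathcal{S}_{0:k-1} - \mu_{0:k-1}) \qquad (31)$

$\Sigma_{k|0:k-1} = K(x_k, x_k) - \Omega \cdot K(\mathcal{X}_{0:k-1}, x_k) \qquad (32)$

with $\Omega \doteq K(x_k, \mathcal{X}_{0:k-1})\, K(\mathcal{X}_{0:k-1}, \mathcal{X}_{0:k-1})^{-1}$.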
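The conditioning of Eqs. (31)-(32) for one classification-vector component can be sketched as follows. It assumes the caller supplies the class-model covariance and mean over the relative poses $x_0, \ldots, x_k$ (e.g., from Eqs. (13)-(14)), ordered with $x_k$ last; $\Omega$ is computed with a linear solve rather than an explicit inverse. The function name is hypothetical.

```python
import numpy as np

def condition_on_past(K, mu, S_past):
    """Gaussian conditioning of the current output on past outputs, Eqs. (31)-(32).

    K:      (k+1, k+1) covariance of one component over relative poses x_0..x_k;
            the last row/column corresponds to x_k.
    mu:     (k+1,) class-model mean at the same poses.
    S_past: (k,) past classifier outputs s_0..s_{k-1} for this component.
    Returns (mu_{k|0:k-1}, Sigma_{k|0:k-1}, Omega).
    """
    K_pp, K_kp, K_kk = K[:-1, :-1], K[-1, :-1], K[-1, -1]
    # Omega = K(x_k, X) K(X, X)^{-1}; K_pp is symmetric, so one solve suffices
    Omega = np.linalg.solve(K_pp, K_kp)
    mu_cond = mu[-1] + Omega @ (S_past - mu[:-1])   # Eq. (31)
    Sigma_cond = K_kk - Omega @ K_kp                # Eq. (32)
    return mu_cond, Sigma_cond, Omega
```

With two poses, unit variances and correlation 0.6, a past output one unit above its mean shifts the current mean by 0.6 and shrinks its variance from 1 to 0.64.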

Using these relations, the integrand of Eq. (28) is a Gaussian distribution over $\mathcal{S}_{0:k}$, which can be derived as follows.

$\begin{matrix}{{{{{\mathbb{P}}\left( {\left. s_{k} \middle| c \right.,\mathcal{S}_{0:{k - 1}},\mathcal{X}_{0:k}^{({rel})}} \right)} \cdot {\prod\limits_{i = 0}^{k - 1}{{\mathbb{P}}\left( s_{i} \middle| z_{i} \right)}}} = {{\eta exp}\left\{ {{- \frac{1}{2}}\left( {{{s_{k} - \mu_{{k|0}:{k - 1}}}}_{\Sigma_{{k|0}:{k - 1}}}^{2} + {\sum\limits_{i = 0}^{k - 1}{{s_{i} - \mu_{z_{i}}}}_{\Sigma_{z_{i}}}^{2}}} \right)} \right\}}},} & (33)\end{matrix}$

where η only depends on

_(0:k) ^((rel)). Using Eq. (31) we can write

$\begin{matrix}{{s_{k} - \mu_{{k|0}:{k - 1}}} = {s_{k} - \mu_{k} - {\Omega \cdot \left( {\mathcal{S}_{0:{k - 1}} - \mu_{0:{k - 1}}} \right)}}} & {{~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~}(34)} \\{= {\begin{bmatrix}{- \Omega} & I\end{bmatrix}\left( {\mathcal{S}_{0:k} - \mu_{0:k}} \right)}} & {(35)}\end{matrix}$

The integrand Eq. (33) of Eq. (28) is therefore proportional to a joint Gaussian distribution $N(\mu_J, \Sigma_J)$ with

$\Sigma_J = (\Sigma_s^{-1} + \Sigma_z^{-1})^{-1} \qquad (36)$

$\mu_J = \Sigma_J \cdot (\Sigma_s^{-1}\mu_s + \Sigma_z^{-1}\mu_z), \qquad (37)$

$\mu_s = \begin{pmatrix}\mu_0 \\ \vdots \\ \mu_{k-1} \\ \mu_k\end{pmatrix}, \quad \mu_z = \begin{pmatrix}\mu_{z_0} \\ \vdots \\ \mu_{z_{k-1}} \\ 0\end{pmatrix}, \qquad (38)$

$\Sigma_s^{-1} = \begin{bmatrix}-\Omega & I\end{bmatrix}^T \Sigma_{k|0:k-1}^{-1} \begin{bmatrix}-\Omega & I\end{bmatrix} \qquad (39)$

$\Sigma_z^{-1} = \begin{pmatrix}\Sigma_{z_0}^{-1} & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \Sigma_{z_{k-1}}^{-1} & 0 \\ 0 & \cdots & 0 & 0\end{pmatrix} \qquad (40)$

Finally, the class likelihood of Eq. (27) is the marginal distribution of the above. Specifically, the integral is directly calculated by evaluating, at $s_k$, a Gaussian PDF whose mean and covariance are the components of $\mu_J$ and $\Sigma_J$ corresponding to $s_k$. The above steps show the calculation of term (a) of Eq. (17), for updating the class belief given known localization upon arrival of new measurements. The next section describes the calculation of term (b) of Eq. (17), the “localization” belief term.
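For one scalar component, Eqs. (36)-(40) and the marginalization described above can be sketched as follows. The sketch uses $\Sigma_J$ (not its inverse) in Eq. (37), as the product-of-Gaussians identity requires, treats each $\Sigma_{z_i}$ as a per-component variance, and the function name is hypothetical. Because both factors of Eq. (33) are normalized densities in this setting, the $s_k$ entry should agree with the closed-form linear-Gaussian marginal with mean $\mu_k + \Omega\,(m_z - \mu_{0:k-1})$, where $m_z = (\mu_{z_0}, \ldots, \mu_{z_{k-1}})$, and variance $\Sigma_{k|0:k-1} + \Omega\,\mathrm{diag}(\Sigma_{z_0}, \ldots, \Sigma_{z_{k-1}})\,\Omega^T$, which makes a convenient check.

```python
import numpy as np

def class_likelihood(s_k, mu_s, Omega, Sigma_cond, mu_z, var_z):
    """P(s_k | c, H_k minus {z_k}) of Eq. (27) for one scalar component.

    mu_s:       (k+1,) class-model means mu_0..mu_k at the relative poses.
    Omega:      (k,) row vector of Eq. (31); Sigma_cond: scalar of Eq. (32).
    mu_z, var_z: (k,) dropout mean/variance of past outputs, Eq. (29).
    """
    k = len(mu_z)
    A = np.append(-Omega, 1.0)[None, :]           # [-Omega  I], shape (1, k+1)
    P_s = A.T @ A / Sigma_cond                     # Sigma_s^{-1}, Eq. (39)
    P_z = np.diag(np.append(1.0 / var_z, 0.0))     # Sigma_z^{-1}, Eq. (40)
    Sigma_J = np.linalg.inv(P_s + P_z)             # Eq. (36)
    mu_J = Sigma_J @ (P_s @ mu_s + P_z @ np.append(mu_z, 0.0))   # Eqs. (37)-(38)
    m, v = mu_J[k], Sigma_J[k, k]                  # marginal of the s_k entry
    return np.exp(-0.5 * (s_k - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
```

The matrix $\Sigma_s^{-1} + \Sigma_z^{-1}$ is invertible even though each term alone is singular: $\Sigma_z^{-1}$ covers the past outputs and $\Sigma_s^{-1}$ supplies the missing $s_k$ direction.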

Localization Inference

We assume that the object orientation relative to the robot is known (perhaps detected from observations $\mathcal{Z}$), and so $o$ has three degrees of freedom (location). Hence, term (b) of Eq. (17) is essentially a SLAM problem, with the robot pose history $\mathcal{X}_{0:k}$ and one landmark, the target object pose $o$, to be inferred. Specifically, we can express the target distribution as a marginalization over all landmarks $\mathcal{L}$ except the object of interest:

$\begin{matrix}{{{\mathbb{P}}\left( {\mathcal{X}_{0:k},\left. o \middle| \mathcal{H}_{k} \right.} \right)} = {\int_{\mathcal{L}}{{{\mathbb{P}}\left( {\mathcal{X}_{0:k},o,\left. \mathcal{L} \middle| \mathcal{H}_{k} \right.} \right)}{{d\mathcal{L}}.}}}} & (41)\end{matrix}$

This can be computed using state-of-the-art methods such as iSAM2, as described by M. Kaess, et al., “iSAM2: Incremental smoothing and mapping using the Bayes tree,” Intl. J. of Robotics Research, 31:217-236, February 2012.

To review, the procedure of the present invention includes the following steps. Inputs include 1) a “belief” (i.e., an uncertain estimate) of the robot track and the environment, and $\mathcal{S}_{0:k-1}$, which is a vector of $s^{(nominal)}$ from previous time steps, and 2) observations from the current time step. The procedure is run at each time step.

Input: localization posterior $\mathbb{P}(\mathcal{X}_{0:k}, o \mid \mathcal{H}_k)$, new observations $\mathcal{Z}_k$
 1: $\mathcal{S}_k = \{s_k^{(i)}\}_{i=1}^{n_k} \leftarrow$ CLASSIFYWITHDROPOUT($\mathcal{Z}_k$); pre-calculate $\Sigma_z^{-1}$, $\mu_z$  ▷ Uncertain classifier measurements are obtained from the raw observations.
 2: Sample $\{\mathcal{X}_{0:k}^{(i)}, o^{(i)}\}_{i=1}^{n_x} \sim \mathbb{P}(\mathcal{X}_{0:k}, o \mid \mathcal{H}_k)$  ▷ Samples are taken from the trajectory and environment estimates.
 3: for $\mathcal{X}_{0:k}, o \in \{\mathcal{X}_{0:k}^{(i)}, o^{(i)}\}$ do  ▷ Classification is computed given each trajectory and environment sample (realization); see step 12.
 4:   $\forall c \in \mathcal{C}$ calculate $\Sigma_J$, $\mu_J$  ▷ These are per class, through the GP model.
 5:   for $s_k \in \mathcal{S}_k$ do
 6:     for $c \in \mathcal{C}$ do
 7:       $\mathbb{P}(s_k \mid c, H_k \setminus \{z_k\}) \leftarrow N(s_k; \mu_J^{(k)}, \Sigma_J^{(k,k)})$  ▷ Likelihood given past observations, Eq. (27).
 8:       $\tilde{h}^c \leftarrow \mathbb{P}(s_k \mid c, H_k \setminus \{z_k\}) \cdot \mathbb{P}(c \mid H_{k-1})$  ▷ Unnormalized class likelihood, Eq. (23).
 9:     end for
10:     $\mathbb{P}(c \mid s_k, H_k) \leftarrow \eta(s_k) \cdot \tilde{h}^c$, with $\eta(s_k) = 1 / \sum_c \tilde{h}^c$  ▷ Normalize the class likelihood, Eqs. (23)-(24).
11:   end for
12:   $\mathbb{P}(c \mid H_k)^{(i)} \leftarrow \frac{1}{n_k} \sum_{s_k} \mathbb{P}(c \mid s_k, H_k)$  ▷ Classification given localization, Eq. (22).
13: end for
14: $\mathbb{P}(c \mid \mathcal{H}_k) \leftarrow \frac{1}{n_x} \sum_i \mathbb{P}(c \mid H_k)^{(i)}$  ▷ Marginalization over the trajectory/environment samples, Eq. (18).
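The listing above can be sketched end to end as below. This is a structural sketch under stated assumptions: `likelihood(s, c, sample)` is a hypothetical callable returning $\mathbb{P}(s_k \mid c, H_k \setminus \{z_k\})$ for one localization sample (steps 4 and 7, e.g., via the Gaussian marginal of Eq. (27)), and the previous beliefs $\mathbb{P}(c \mid H_{k-1})$ are assumed to be available per sample as an array.

```python
import numpy as np

def classification_step(pose_samples, S_k, prior, likelihood):
    """One time step of the listed procedure (steps 3-14).

    pose_samples: sequence of n_x sampled (trajectory, object) realizations (step 2).
    S_k:          (n_k, N_c) dropout classification vectors for z_k (step 1).
    prior:        (n_x, N_c) previous beliefs P(c | H_{k-1}), one row per sample.
    likelihood:   hypothetical callable likelihood(s, c, sample) -> float.
    Returns (class belief P(c | H_k), per-sample posteriors).
    """
    n_c = prior.shape[1]
    per_sample = np.empty_like(prior, dtype=float)
    for i, sample in enumerate(pose_samples):                     # step 3
        post = np.zeros(n_c)
        for s in S_k:                                             # step 5
            h = np.array([likelihood(s, c, sample) * prior[i, c]
                          for c in range(n_c)])                   # steps 6-9, Eq. (23)
            post += h / h.sum()                                   # step 10, Eq. (24)
        per_sample[i] = post / len(S_k)                           # step 12, Eq. (22)
    return per_sample.mean(axis=0), per_sample                    # step 14, Eq. (18)
```

With a likelihood that simply returns the classifier score for class $c$ and a uniform prior, the belief reduces to the average dropout vector, which serves as a quick sanity check.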

Results

As described above, embodiments of the present invention provide methods for classification of an object that is viewed from multiple directions. The method includes fusing classifier measurements which include a model uncertainty measure, and explicitly accounting for viewpoint variability and spatial correlations of classifier outputs, as well as uncertainty in the localization. Simulation experiments described in this section confirm increased robustness to the above sources of uncertainty as compared to current methods. In particular, statistical analysis suggests that in cases where other methods inferred a wrong class with high confidence due to noisy measurements of class or location, the method of the present invention reported high uncertainty, and was generally able to accumulate classification evidence.

In the experimental simulation described below, performed in MATLAB, classifier measurements were generated using the GP model of the ground truth class, along a predetermined track, as shown in FIG. 2.

The class inference algorithm needs to fuse these measurements into a posterior over classes, essentially identifying which of the known GP models is the most likely origin of the measurements. We study the robustness of the GP model to model and localization uncertainty, and compare the results with results obtained from other methods.

Comparison of Approaches and Performance Metrics

We compare the results of three methods. The method of the present invention is denoted as Model with Uncertainty; it takes into account spatial correlations, as well as uncertainty in pose and classifier model uncertainty. The second method is Model Based, similar to the method described by Teacy but with the GP defined as in Eq. (11); it takes into account spatial correlation, but not uncertainties. The third method is Simple Bayes, which directly uses the classifier scores and assumes spatial independence between observations, as in, for example, T. Patten, et al., “Viewpoint evaluation for online 3-D active object classification,” IEEE Robotics and Automation Letters (RA-L), 1(1):73-81, January 2016.

We compare these methods with relation to the following metrics: (i)probability of ground-truth class; (ii) mean squared detection error;and (iii) most likely-to-ground truth ratio.

The mean squared detection error (MSDE) is defined as

$\mathrm{MSDE} \doteq \frac{1}{N_c} \sum_{c' \in C} \left( \delta_c(c') - \mathbb{P}(c' \mid \mathcal{H}) \right)^2 \qquad (42)$

where c is the ground truth class, N_(c) is the number of candidate classes in C, and δ_(c)(c′) is 1 if c=c′ and 0 otherwise. This measure was also used in Teacy.
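As a concrete check of Eq. (42), the short sketch below computes the MSDE for a single posterior. It assumes the posterior is given as a dictionary over all candidate classes, so N_c is simply the length of that dictionary. The function name `msde` is illustrative.

```python
def msde(posterior, true_class):
    """Mean squared detection error, Eq. (42), for one posterior.

    posterior:  dict mapping every candidate class c' to P(c' | H)
    true_class: the ground truth class c
    """
    n_c = len(posterior)  # N_c: number of candidate classes
    return sum(
        ((1.0 if c == true_class else 0.0) - p) ** 2  # (delta_c(c') - P(c'|H))^2
        for c, p in posterior.items()
    ) / n_c
```

As examples:
- A posterior of (0.7, 0.2, 0.1) with the first class as ground truth gives (0.3² + 0.2² + 0.1²)/3 ≈ 0.047.
- A fully confident correct posterior gives 0.
- A fully confident wrong posterior gives 2/N_c.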

The most likely-to-ground truth ratio (MGR) is defined as

$\mathrm{MGR} \doteq \frac{\max_{c'} \mathbb{P}(c' \mid \mathcal{H})}{\mathbb{P}(c \mid \mathcal{H})} \qquad (43)$

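for ground truth class c. Roughly, this measure penalizes high confidence in the wrong class. In a way, it “demands” that the ground truth class be the most likely one (possibly tied with others): the ratio equals 1 exactly when no class is more likely than the ground truth class, and exceeds 1 otherwise.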
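A matching sketch for Eq. (43) follows. It reads the numerator as the probability of the most likely class, that is, the maximum over c′ of P(c′|H); an arg max would return a class label rather than a number. The name `mgr`, the dictionary representation, and returning infinity for a zero ground-truth probability are assumptions for illustration.

```python
import math


def mgr(posterior, true_class):
    """Most likely-to-ground truth ratio, Eq. (43).

    posterior:  dict mapping every candidate class c' to P(c' | H)
    true_class: the ground truth class c
    """
    p_true = posterior[true_class]
    if p_true == 0.0:
        return math.inf  # the ground truth class was ruled out entirely
    return max(posterior.values()) / p_true
```

Take the posterior (0.7, 0.2, 0.1). MGR is 1 when the first class is the ground truth and 3.5 when the second one is. Any value above 1 signals that some other class outranks the ground truth.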

Simulation Experiments

Statistics (over realizations) for the three algorithms were collected for several scenarios. In each scenario, GP models were created for three classes by manually specifying the classifier response for chosen relative locations around the origin (i.e., locations assumed to be in object-centered coordinates) in the 2D plane, as indicated in FIG. 1. Note that a GP model for a class describes classifier responses for all classes (see Eq. (15)).

During each simulation, the robot moved along a pre-specified trajectory and observed a single object from different viewpoints, as indicated in FIG. 2. At each time step, the algorithm received new classifier measurements and an updated pose belief (simulating a SLAM solution). Classifier measurements were generated using the GP model of a “ground truth” class (the simulation of measurements is detailed in the next subsections), which needs to be inferred by the algorithm using the measurements.

Model Uncertainty Scenario

Model uncertainty expresses the reliability of the classifier output. High model uncertainty corresponds to situations where classifier inputs are far from the training data, often because the pictured scene, object, or viewpoint is unfamiliar, and it causes output that may be arbitrary. We simulated this with two steps, performed at each time instant. First, a nominal “true” measurement s^(nominal) is generated from the GP model of the ground truth class.

Second, the level of model uncertainty σ_(u)² was selected at each time step uniformly between 0 and σ_(max)² (a parameter). This level was then used as the standard deviation of a Gaussian centered at the true measurement, to generate a simulated noised measurement s^(noised). The Model Based and Simple Bayes algorithms receive s^(noised) as a classification measurement and are not “aware” of the uncertainty. The procedure for simulating classifier outputs at each step is described algorithmically here:

Input: S_(0:k−1), 𝒳_(0:k)^(rel), σ_(max)², N_(samples)
1: s^(nominal) ~ ℙ(s | c, S_(0:k−1), 𝒳_(0:k)^(rel))    ▷ See Eq. (30)
2: σ_(u)² ~ Uni(0, σ_(max)²)    ▷ Choose uncertainty level
3: s^(noised) ~ N(s^(nominal), σ_(u)² I)    ▷ Uncertain classification
4: samples ← ∅
5: for N_(samples) times do    ▷ Simulating dropout
6:     s ~ N(s^(noised), σ_(u)² I)
7:     samples ← samples ∪ {s}
8: end for
9: return s^(nominal), s^(noised), samples

As indicated, the procedure generates samples (simulating the outputs of several forward passes with dropout) drawn from a Gaussian distribution centered at s^(noised) with standard deviation σ_(u)². The first simulation showed the effects of considerable model uncertainty, with no localization errors (perfect localization).
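Steps 2 to 9 of the procedure above can be sketched in plain Python. This is a minimal sketch under the following assumptions:
- The nominal measurement from step 1 is passed in directly, because the GP conditional of Eq. (30) is not reproduced here.
- σ_u² is treated as a variance, following the N(·, σ_u² I) notation of the listing, even though the prose calls it a standard deviation. Using σ_u² directly as the standard deviation is a one-line change.
- The function name `simulate_measurements` is hypothetical.

```python
import math
import random


def simulate_measurements(s_nominal, sigma2_max, n_samples, rng=random):
    """Simulate one step of uncertain classifier output (steps 2-9).

    s_nominal:  nominal measurement vector (list of floats); in the described
                simulation it is drawn from the ground-truth GP model (step 1)
    sigma2_max: upper bound of the model-uncertainty level
    n_samples:  number of simulated dropout forward passes
    rng:        random source providing uniform() and gauss()
    """
    # Step 2: draw the uncertainty level; treated here as a variance (assumption).
    sigma2_u = rng.uniform(0.0, sigma2_max)
    std = math.sqrt(sigma2_u)
    # Step 3: noised measurement, fed to the Model Based and Simple Bayes methods.
    s_noised = [x + rng.gauss(0.0, std) for x in s_nominal]
    # Steps 4-8: a cloud of samples around s_noised, mimicking dropout passes.
    samples = [[x + rng.gauss(0.0, std) for x in s_noised] for _ in range(n_samples)]
    # Step 9
    return s_nominal, s_noised, samples
```

Passing a seeded `random.Random` instance as `rng` makes a simulated run reproducible. The spread of `samples` grows with the drawn uncertainty level, which is what lets an uncertainty-aware method discount unreliable steps.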

FIGS. 3A-C show plots of a GP model of the ground truth class and simulated classifier measurements (s^(noised)) acquired while a robot moves along a path. FIG. 3A shows the Gaussian process model for the ground truth class and the simulated (noised) classifier measurements over the course of the robot's trajectory, plotting the response for the first label against the response for the second label. FIGS. 3B and 3C show the first and second components, respectively, over time indices.

FIGS. 4A-F show the statistics described above: the probability assigned to the ground truth class and the metrics of Eqs. (42-43). Percentiles over scenario realizations are drawn as patches of varying saturation with a 10% step. The median is plotted darkest, the next lighter patch indicates runs between the 40th and 60th percentiles, the next indicates runs between the 30th and 70th, and so on. The areas above and below the plots contain the top and bottom 10% of the runs, respectively. FIGS. 4A, 4C, and 4E compare the Model Based method (green) with the method of the present invention (“our method”) for the respective metrics of probability of correct class, most likely-to-ground truth ratio (MGR), and mean squared detection error (MSDE). FIGS. 4B, 4D, and 4F compare the Simple Bayes method (red) with the method of the present invention for the same respective metrics. The legend in the leftmost column shows the percentage of time steps where the most likely class was the correct one.
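The percentile bands described above can be computed per time step as follows. This is a sketch under stated assumptions:
- Each realization is a list of metric values indexed by time step.
- `statistics.quantiles` is used with its default "exclusive" method, since the interpolation used for the figures is not stated.
- The function name `percentile_bands` and the band labels are illustrative.

```python
import statistics


def percentile_bands(runs):
    """Median and central percentile bands of a metric at each time step.

    runs: list of realizations, each a list of metric values per time step
          (at least two realizations are needed for quantiles).
    Returns one dict per time step with the median and the 40-60, 30-70,
    20-80 and 10-90 percentile bands.
    """
    bands = []
    for values in zip(*runs):  # one time step, across all realizations
        d = statistics.quantiles(values, n=10)  # 10th, 20th, ..., 90th percentiles
        bands.append({
            "median": d[4],
            "40-60": (d[3], d[5]),
            "30-70": (d[2], d[6]),
            "20-80": (d[1], d[7]),
            "10-90": (d[0], d[8]),
        })
    return bands
```

Values outside the 10-90 band correspond to the top and bottom 10% of runs, which the figures leave as the areas above and below the patches.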

In comparison with the Model Based method, results for the present method were more concentrated (FIGS. 4A, 4C, 4E), which means that the method results were more stable. For example, in more than 20% of the runs (the bottom lightest shade and the white area below it), the probability of the correct class for Model Based at time step 15 was less than 0.2, compared to more than 0.33 for ours. In more than 20% of the runs, the MGR for Model Based at time step 15 was higher than 1. This means that a wrong (most likely) class was assigned probability more than twice higher than the correct one, i.e., the wrong class was chosen with high confidence. The MSDE plot displays similar behavior. In FIGS. 4B, 4D, and 4F, a drop in the accuracy of Simple Bayes around time step 15 was the result of an “inverse” measurement in the model: from a certain angle, the classifier response suggested a different class. This illustrates the difference between Simple Bayes and the model-based methods, including ours. The model-based methods match the entire sequence of measurements against a model and thus can also use “inverse” measurements to classify correctly (on the downside, they require a class model).

Localization Uncertainty Scenario

In methods making use of spatial class models, localization errors may cause classification aliasing when acquired measurements correspond to the model of a wrong class because of the spatial shift in the query. To exemplify this, in a “Localization Uncertainty” scenario, we introduced a constant bias in the easting coordinate (the robot moves eastward in a straight line), causing aliasing between models (with no model uncertainty). FIG. 5A shows the GP mean of the correct class model (blue) and the classifier output over the robot track (red). It also shows the GP mean of the model of a wrong class (yellow).

In FIG. 5B, classifier outputs for label 2 (red) without localization bias are compared against the corresponding GP component of the ground truth class model (blue), showing a clear match. FIG. 5C shows that after introducing a bias of −8 units in easting, classifier responses (red) were matched against shifted spatial models, making the wrong class (yellow) a more likely match until approximately time step 16, after which the correct class model (blue) matched in spite of the shift.

The effects of this on performance are shown in FIGS. 6A-F. Our method accumulates classification evidence rather than prematurely emitting an (over)confident, possibly wrong, classification decision. By contrast, the Model Based method infers the wrong class with high confidence (as can be seen in the MGR plots, FIGS. 6C and 6D), peaking at approximately time step 15, after which disambiguating measurements start to arrive. As can be seen in FIG. 6D, the Simple Bayes method performs well, closely following the line from the respective FIG. 5 graph, because the classifier measurements are stable and not ambiguous. Note that aliasing occurs only when trying to match measurements against the different spatial class models, which Simple Bayes does not use.

The classification observations were generated as follows. A GP model of the ground truth class, along with previous classification measurements, was used to generate a “ground truth” measurement. A “noised” measurement was generated from a Gaussian centered at it and fed to the Model Based and Simple Bayes schemes. A cloud of measurements was then generated around the noised measurement to reflect the distance from the ground truth, and fed to our scheme. The standard deviation for generating the noised classification measurement was picked uniformly at each time step from a range defined per experiment.

Processing elements of the system described herein may be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations thereof. Such elements can be implemented as a computer program product, tangibly embodied in an information carrier, such as a non-transient, machine-readable storage device. Such a product may be executed by, or control the operation of, data processing apparatus such as a programmable processor or computer. It may also be deployed to be executed on multiple computers at one site or distributed across multiple sites. Memory storage for software and data may include one or more memory units, including one or more types of storage media. Examples of storage media include, but are not limited to, magnetic media, optical media, and integrated circuits such as read-only memory devices (ROM) and random access memory (RAM). Network interface modules may control the sending and receiving of data packets over networks. Method steps associated with the system and process can be rearranged and/or one or more such steps can be omitted to achieve the same, or similar, results to those described herein. It is to be understood that the embodiments described hereinabove are cited by way of example, and that the present invention is not limited to what has been particularly shown and described hereinabove.

1.-4. (canceled)
5. A method of classifying an object appearing in robotic observations of a scene obtained from multiple sequential viewpoints, the method comprising: from a training set of sequential observations and corresponding viewpoints, generating, for each class of a set of classes of objects, a viewpoint-dependent class model representing each class as a Gaussian process, wherein each class model includes spatial correlations between the viewpoints in classifier measurements for the corresponding observations; subsequently: A. acquiring a history H_(k) of multiple sequential robotic observations and controls up to a time k, including an observation at time k, denoted as z_(k); B. applying a classifier to generate from z_(k) a set S_(k) of classifier measurements, wherein the distribution of S_(k) represents a model uncertainty of the classifier, wherein each classifier measurement s_(k) of the set S_(k) is a vector, each component of which indicates a likelihood of an object belonging to a class represented by the component; C. sampling a trajectory X_(0:k) and a pose o of an object from a joint pose distribution of trajectories and object poses, given the history H_(k); D. sampling a classifier measurement s_(k) from the set S_(k); E. for each component s^((i)) of the sampled classifier measurement s_(k), applying the class model for a class c represented by the component to calculate a likelihood of s^((i)) belonging to the class c, given observations prior to time k; F. from the likelihood of each s^((i)), generating a likelihood value P(s_(k)|c,H_(k)\{z_(k)}), representing a likelihood value of s_(k) belonging to a given class c, wherein the likelihood value of s_(k) is calculated by a function employing the spatial class model of the given class c and a model uncertainty in past classifier measurements; G. generating, from the likelihood value of s_(k), a normalized class likelihood value P(c|s_(k),H_(k)); H. generating an average class likelihood, P(c|H_(k))^((i)), for the given class c, given H_(k) with respect to the sampled trajectory, by repeating the calculation of the normalized class likelihood value for multiple samples s_(k), given the sampled trajectory; I. generating, for multiple sampled trajectories of X_(0:k), multiple respective average class likelihoods; and J. averaging the average class likelihoods to generate a weighted average class likelihood P(c|H_(k)) for the given class c, given H_(k) (step 14), indicative of a probability of the object belonging to the given class c, given trajectory uncertainty, viewpoint-dependent variations, and model uncertainty of the classifier.
6. The method of claim 5, wherein the normalized likelihood P(c|s_(k),H_(k)) is calculated from the likelihood value of s_(k) by calculating, at an interim step, a weighted value h̃^(c) ≐ P(s_(k)|c,H_(k)\{z_(k)})·P(c|H_(k−1))^((i)), wherein P(c|H_(k−1))^((i)) is saved from a previous time step k−1.
7. The method of claim 6, wherein P(c|s_(k),H_(k)) is calculated as h̃^(c)/Σ_(c) h̃^(c), where the denominator Σ_(c) h̃^(c) is a sum taken over a set of multiple candidate classes c.
8. The method of claim 5, wherein the likelihood value of s_(k) is calculated as a marginalization of a Gaussian distribution P(s_(0:k)|c,H_(k)\{z_(k)}), wherein s_(0:k) ≐ (s₀, s₁, …, s_(k)) is a set of classifier measurements taken at sequential times 0 through k.
9. The method of claim 8, wherein the Gaussian distribution has a mean μ_(J) and a covariance matrix Σ_(J) defined as: $\Sigma_J = \left(\Sigma_s^{-1} + \Sigma_z^{-1}\right)^{-1}, \quad \mu_J = \Sigma_J \left(\Sigma_s^{-1}\mu_s + \Sigma_z^{-1}\mu_z\right)$, where the terms Σ_(s), representing a covariance matrix, and μ_(s), representing a mean vector, are obtained through applying the class model to the trajectory sample X_(0:k) and to the object pose sample o, and where the terms Σ_(z), representing a covariance matrix, and μ_(z), representing a mean vector, are constructed from sets of classifier measurement samples, such that μ_(s) and μ_(z) respectively represent the vectors: $\mu_s = \begin{pmatrix} \mu_0 \\ \vdots \\ \mu_{k-1} \\ \mu_k \end{pmatrix}, \quad \mu_z = \begin{pmatrix} \mu_{z_0} \\ \vdots \\ \mu_{z_{k-1}} \\ 0 \end{pmatrix}$ and wherein Σ_(s)⁻¹ and Σ_(z)⁻¹ respectively represent the matrices: $\Sigma_s^{-1} = \begin{bmatrix} -\Omega & I \end{bmatrix}^T \Sigma_{k|0:k-1}^{-1} \begin{bmatrix} -\Omega & I \end{bmatrix}$ and $\Sigma_z^{-1} = \begin{pmatrix} \Sigma_{z_0}^{-1} & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ \vdots & & \Sigma_{z_{k-1}}^{-1} & 0 \\ 0 & \cdots & 0 & 0 \end{pmatrix}$.
10. The method of claim 9, wherein the steps A-J are performed iteratively at multiple sequential time steps, and wherein vector μ_(z) and matrix Σ_(z)⁻¹ are calculated prior to the steps A-J at each time step.
11. The method of claim 5, wherein the set of classifier measurements S_(k) is obtained by applying Monte-Carlo dropout to the classifier with the observation z_(k) as input, for each classifier measurement s_(k).
12. The method of claim 10, wherein the set of classifier measurements S_(k) is obtained by applying Monte-Carlo dropout in a single forward pass.
13. The method of claim 5, wherein the steps A-J are iterative steps performed at sequential time steps, and wherein the Gaussian processes that represent the classes are fit to their respective classes one time prior to the iterative steps.
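The fusion steps recited in claims 5-7 and 9 can be illustrated with a plain-Python sketch. It is not the patented implementation, and it rests on the following assumptions:
- All helper names are hypothetical.
- The likelihood values P(s_k|c, H_k\{z_k}) are taken as given inputs rather than computed from the GP class models.
- Per-trajectory averages are combined with uniform weights, since the weights of the weighted average in step J are not specified here.
- The Gaussian fusion uses the standard product-of-Gaussians identity in information form, in which the fused mean is Σ_J(Σ_s⁻¹μ_s + Σ_z⁻¹μ_z).

```python
# Hypothetical sketch of the fusion in claims 5-7 and 9 (uniform trajectory
# weights and plain-Python linear algebra are assumptions, not the claimed method).


def normalized_class_posterior(likelihood, prior):
    """Claims 6-7: h~^c = P(s_k | c, H_k without z_k) * P(c | H_{k-1})^(i),
    divided by the sum of h~^c over the candidate classes."""
    h = {c: likelihood[c] * prior[c] for c in prior}
    total = sum(h.values())
    return {c: v / total for c, v in h.items()}


def fused_class_posterior(per_trajectory):
    """Steps G-J of claim 5.

    per_trajectory: one (prior, likelihoods) pair per sampled trajectory, where
    prior maps class -> P(c | H_{k-1})^(i) and likelihoods holds one dict
    class -> P(s_k | c, ...) per sampled measurement s_k.
    """
    averages = []
    for prior, likelihoods in per_trajectory:
        # Steps G-H: normalize per sampled s_k, then average over the samples.
        posts = [normalized_class_posterior(lk, prior) for lk in likelihoods]
        averages.append({c: sum(p[c] for p in posts) / len(posts) for c in prior})
    # Steps I-J: average over sampled trajectories (uniform weights assumed).
    n = len(averages)
    return {c: sum(a[c] for a in averages) / n for c in averages[0]}


def mat_inv(a):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def mat_vec(a, x):
    """Matrix-vector product for lists of lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def fuse_information(info_s, mu_s, info_z, mu_z):
    """Claim 9 in information form: Sigma_J = (Sigma_s^-1 + Sigma_z^-1)^-1 and
    mu_J = Sigma_J (Sigma_s^-1 mu_s + Sigma_z^-1 mu_z). Returns (mu_J, Sigma_J)."""
    info_j = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(info_s, info_z)]
    sigma_j = mat_inv(info_j)
    eta = [a + b for a, b in zip(mat_vec(info_s, mu_s), mat_vec(info_z, mu_z))]
    return mat_vec(sigma_j, eta), sigma_j
```

The matrix helpers are only suitable for the small matrices of a sketch. A numerical linear-algebra library would normally be used instead.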